unsolved problem
A New AI Math Startup Just Cracked 4 Previously Unsolved Problems
Axiom says its AI found solutions to several long-standing math problems, a sign of the technology's steadily advancing reasoning capabilities. Five years ago, mathematicians Dawei Chen and Quentin Gendron were trying to untangle a difficult area of algebraic geometry involving differentials, elements of calculus used to measure distance along curved surfaces. While working on one theorem, they ran into an unexpected roadblock: Their argument depended on a strange formula from number theory, but they were unable to solve or justify it. In the end, Chen and Gendron wrote a paper presenting their idea as a conjecture, rather than a theorem. Chen recently spent hours prompting ChatGPT in the hopes of getting the AI to come up with a solution to the still unsolved problem, but it wasn't working.
DeepMind and OpenAI claim gold in International Mathematical Olympiad
Experimental AI models from Google DeepMind and OpenAI have achieved a gold-level performance in the International Mathematical Olympiad (IMO) for the first time. The companies are hailing the moment as an important milestone for AIs that might one day solve hard scientific or mathematical problems, but mathematicians are more cautious because details of the models' results and how they work haven't been made public. The IMO, one of the world's most prestigious competitions for young mathematicians, has long been seen by AI researchers as a litmus test for mathematical reasoning that AI systems tend to struggle with. After last year's competition held in Bath, UK, Google DeepMind announced that AI systems it had developed, called AlphaProof and AlphaGeometry, had together achieved a silver medal-level performance, but its entries weren't graded by the competition's official markers. Before this year's contest, which was held in Queensland, Australia, companies including Google, Huawei and TikTok-owner ByteDance, as well as academic researchers, approached the organisers to ask whether they could have their AI models' performance officially graded, says Gregor Dolinar, the IMO's president.
The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz
This research introduces a novel evaluation framework designed to assess large language models' (LLMs) ability to acknowledge uncertainty on 675 fundamentally unsolvable problems. Using a curated dataset of graduate-level grand challenge questions with intentionally unknowable answers, we evaluated twelve state-of-the-art LLMs, including both open and closed-source models, on their propensity to admit ignorance rather than generate plausible but incorrect responses. The best models scored in 62-68% accuracy ranges for admitting the problem solution was unknown in fields ranging from biology to philosophy and mathematics. We observed an inverse relationship between problem difficulty and model accuracy, with GPT-4 demonstrating higher rates of uncertainty acknowledgment on more challenging problems (35.8%) compared to simpler ones (20.0%). This pattern indicates that models may be more prone to generate speculative answers when problems appear more tractable. The study also revealed significant variations across problem categories, with models showing difficulty in acknowledging uncertainty in invention and NP-hard problems while performing relatively better on philosophical and psychological challenges. These results contribute to the growing body of research on artificial general intelligence (AGI) assessment by highlighting the importance of uncertainty recognition as a critical component of future machine intelligence evaluation. This impossibility test thus extends previous theoretical frameworks for universal intelligence testing by providing empirical evidence of current limitations in LLMs' ability to recognize their own knowledge boundaries, suggesting new directions for improving model training architectures and evaluation approaches.
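As a rough illustration of the kind of metric the abstract describes, the sketch below computes an uncertainty-acknowledgment rate per difficulty group. It is not the paper's method: the keyword list `UNKNOWN_MARKERS`, the helper names, and the sample records are all hypothetical, and a real evaluation would need a far more careful way of judging whether a response admits ignorance.

```python
from collections import defaultdict

# Hypothetical phrases treated as an admission that the answer is unknown.
# This keyword check is an assumption for illustration only.
UNKNOWN_MARKERS = ("unknown", "unsolved", "no known solution", "i don't know")


def admits_uncertainty(response: str) -> bool:
    """Return True if the response contains any of the marker phrases."""
    text = response.lower()
    return any(marker in text for marker in UNKNOWN_MARKERS)


def acknowledgment_rate_by_group(records):
    """Compute the fraction of responses admitting uncertainty per group.

    `records` is an iterable of (group, response) pairs, where `group`
    is a label such as a difficulty level or problem category.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, response in records:
        totals[group] += 1
        if admits_uncertainty(response):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up sample data, not taken from the study.
sample = [
    ("hard", "This problem is currently unsolved."),
    ("hard", "The answer is 42."),
    ("easy", "The answer is clearly 7."),
    ("easy", "The solution is x = 3."),
]
rates = acknowledgment_rate_by_group(sample)
print(rates)
```

Grouping by label makes it easy to compare rates across difficulty levels or problem categories, which is the comparison the abstract reports.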
Machine Learning Safety: Unsolved Problems - KDnuggets
Along with researchers from Google Brain and OpenAI, we are releasing a paper on Unsolved Problems in ML Safety. Due to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. As a preview of the paper, in this post, we consider a subset of the paper's directions, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), and steering ML systems ("Alignment"). Robustness research aims to build systems that are less vulnerable to extreme hazards and adversarial threats. Two problems in robustness are robustness to long tails and robustness to adversarial examples.
AI is helping tackle one of the biggest unsolved problems in maths
Artificial intelligence's ability to sift through large amounts of data is helping us tackle one of the most difficult unsolved problems in mathematics. Yang-Hui He at City, University of London in the UK and colleagues are using machine learning to better understand the Birch and Swinnerton-Dyer conjecture, one of the seven fiendishly difficult Millennium Prize Problems, each of which carries a million-dollar reward for the first correct solution.
Unsolved Problems in Machine Learning
I am actually not even aware of any machine learning (ML) problem that is considered to have been solved recently or in the past. This tells you a lot about how hard things really are in ML. Of course, if you read media outlets, it may seem like researchers are sweeping the floor clean with deep learning (DL), solving ML problems one after the other and leaving no stone unturned. In reality, they are not; researchers actually attack relatively simpler problems in the hope of collectively solving the bigger problems. That is just how research works. You can see that DeepMind's aim is to "solve intelligence and make the world a better place", but they are busy building game-playing algorithms.
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis
Álvez, Javier, Gonzalez-Dios, Itziar, Rigau, German
We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and gaps in knowledge and resources. Our final objective is to extract some guidelines for better exploiting this commonsense knowledge framework by improving the included resources.
Kirk Borne - Analytics Visionary, Space Scientist, and Chronic Learner - Humans of Analytics
It was October 2001, one month after the tragic terrorist attacks on 9-11-2001. I was sitting in my NASA office when the phone rang. The voice on the other end of the call said, "We would like you to brief the President tomorrow on data mining." I remember clearly my response: "Do you mean THE President?" Yes, they did mean the President of the United States.
Charles W. Bachman
Charles William "Charlie" Bachman, the "father of databases" who received the ACM A.M. Turing Award for 1973 for creating the first database management system, died June 13 at the age of 92. Born in Manhattan, KS, in 1924, Bachman earned his B.S. in mechanical engineering in 1948, as well as an M.S. in mechanical engineering from the University of Pennsylvania. He went to work for Dow Chemical in 1950, using mechanical punched-card computing devices to solve networks of simultaneous equations representing data from Dow plants. In 1957, Bachman became head of Dow's Data Processing Department, through which he became a member of Share Inc., and a founding member of the Share Data Processing Committee. In 1960, Bachman joined the General Electric (GE) Production Control Services Group in New York City, using a factory in Philadelphia to test designs for a system to automate factory planning, scheduling, operational control, and inventory control.